Intelligent Paper Notes

Three Characteristics of Successful Symbiotes: Observing the Evolution of Symbiosis in Silico

Peter D. Turney

Category: Neural and Evolutionary Computing

2021-04-02

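In past work, we developed a computational model of the evolution of symbiotic entities (Model-S), based on Conway's Game of Life. In this article, we examine three trends that biologists have observed in the evolution of symbiotes. (1) Management: if one partner is able to control the symbiotic relation, this control can reduce conflict; thus, evolutionary selection favors symbiotes that have a manager. (2) Mutualism: although partners in a symbiote often have conflicting needs, evolutionary selection favors cooperation between the partners. (3) Interaction: repeated interaction among partners in symbiosis tends to promote increasing fitness due to evolutionary selection. We added instrumentation to Model-S that allows us to take detailed measurements and determine whether the three trends can be observed in the simulation. When fitness is measured by number of children, we find that fitter symbiotes exhibit more management, mutualism, and interaction than less fit symbiotes. These results confirm the trends that biologists have observed in nature. Model-S allows biologists to study these evolutionary trends and other characteristics of symbiosis in ways that are not possible with living organisms.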
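
For readers unfamiliar with the substrate mentioned in the abstract, the following is a minimal sketch of one update step of the standard Conway's Game of Life. It is not Model-S itself, whose rules and instrumentation are described in the paper; the function name `life_step` and the set-of-coordinates representation are assumptions chosen for brevity.

```python
from collections import Counter


def life_step(live):
    """Advance one generation of standard Conway's Game of Life.

    `live` is a set of (row, col) coordinates of live cells on an
    unbounded grid (an illustrative representation, not Model-S).
    """
    # For every cell adjacent to a live cell, count its live neighbours.
    counts = Counter(
        (r + dr, c + dc)
        for r, c in live
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
        if (dr, dc) != (0, 0)
    )
    # A cell is alive next generation if it has exactly 3 live neighbours
    # (birth or survival), or 2 live neighbours and is already alive.
    return {
        cell
        for cell, n in counts.items()
        if n == 3 or (n == 2 and cell in live)
    }


# A horizontal "blinker" flips to vertical and back again.
blinker = {(1, 0), (1, 1), (1, 2)}
next_gen = life_step(blinker)
```

Storing only live cells keeps the sketch independent of grid size; a fixed-size array would work equally well for bounded simulations.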

Scalable Multi-Agent Reinforcement Learning for Warehouse Logistics with Robotic and Human Co-Workers

Aleksandar Krnjaic , Jonathan D. Thomas , Georgios Papoudakis , Lukas Schäfer , Peter Börsting , Stefano V. Albrecht

Category: Machine Learning | Artificial Intelligence | Robotics

2022-12-22

This project leverages advances in multi-agent reinforcement learning (MARL) to improve the efficiency and flexibility of order-picking systems for commercial warehouses. We envision a warehouse of the future in which dozens of mobile robots and human pickers work together to collect and deliver items within the warehouse. The fundamental problem we tackle, called the order-picking problem, is how these worker agents must coordinate their movement and actions in the warehouse to maximise performance (e.g. order throughput) under given resource constraints. Established industry methods using heuristic approaches require large engineering efforts to optimise for innately variable warehouse configurations. In contrast, the MARL framework can be flexibly applied to any warehouse configuration (e.g. size, layout, number/types of workers, item replenishment frequency) and the agents learn via a process of trial-and-error how to optimally cooperate with one another. This paper details the current status of the R&D effort initiated by Dematic and the University of Edinburgh towards a general-purpose and scalable MARL solution for the order-picking problem in realistic warehouses.

I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation

Chandra Bhagavatula , Jena D. Hwang , Doug Downey , Ronan Le Bras , Ximing Lu , Keisuke Sakaguchi , Swabha Swayamdipta , Peter West , Yejin Choi

Category: Natural Language Processing

2022-12-19

Pre-trained language models, despite their rapid advancements powered by scale, still fall short of robust commonsense capabilities. And yet, scale appears to be the winning recipe; after all, the largest models seem to have acquired the largest amount of commonsense capabilities. Or is it? In this paper, we investigate the possibility of a seemingly impossible match: can smaller language models with dismal commonsense capabilities (i.e., GPT-2), ever win over models that are orders of magnitude larger and better (i.e., GPT-3), if the smaller models are powered with novel commonsense distillation algorithms? The key intellectual question we ask here is whether it is possible, if at all, to design a learning algorithm that does not benefit from scale, yet leads to a competitive level of commonsense acquisition. In this work, we study the generative models of commonsense knowledge, focusing on the task of generating generics, statements of commonsense facts about everyday concepts, e.g., birds can fly. We introduce a novel commonsense distillation framework, I2D2, that loosely follows the Symbolic Knowledge Distillation of West et al. but breaks the dependence on the extreme-scale models as the teacher model by two innovations: (1) the novel adaptation of NeuroLogic Decoding to enhance the generation quality of the weak, off-the-shelf language models, and (2) self-imitation learning to iteratively learn from the model's own enhanced commonsense acquisition capabilities. Empirical results suggest that scale is not the only way, as novel algorithms can be a promising alternative. Moreover, our study leads to a new corpus of generics, Gen-A-Tomic, that is of the largest and highest quality available to date.

HACA3: A Unified Approach for Multi-site MR Image Harmonization

Lianrui Zuo , Yihao Liu , Yuan Xue , Blake E. Dewey , Murat Bilgel , Ellen M. Mowry , Scott D. Newsome , Peter A. Calabresi , Susan M. Resnick , Jerry L. Prince

Category: Computer Vision

2022-12-12

The lack of standardization is a prominent issue in magnetic resonance (MR) imaging. This often causes undesired contrast variations due to differences in hardware and acquisition parameters. In recent years, MR harmonization using image synthesis with disentanglement has been proposed to compensate for the undesired contrast variations. Despite the success of existing methods, we argue that three major improvements can be made. First, most existing methods are built upon the assumption that multi-contrast MR images of the same subject share the same anatomy. This assumption is questionable since different MR contrasts are specialized to highlight different anatomical features. Second, these methods often require a fixed set of MR contrasts for training (e.g., both T1-weighted and T2-weighted images must be available), which limits their applicability. Third, existing methods are generally sensitive to imaging artifacts. In this paper, we present a novel approach, Harmonization with Attention-based Contrast, Anatomy, and Artifact Awareness (HACA3), to address these three issues. We first propose an anatomy fusion module that enables HACA3 to respect the anatomical differences between MR contrasts. HACA3 is also robust to imaging artifacts and can be trained and applied to any set of MR contrasts. Experiments show that HACA3 achieves state-of-the-art performance under multiple image quality metrics. We also demonstrate the applicability of HACA3 on downstream tasks with diverse MR datasets acquired from 21 sites with different field strengths, scanner platforms, and acquisition protocols.

Self-Destructing Models: Increasing the Costs of Harmful Dual Uses in Foundation Models

Eric Mitchell , Peter Henderson , Christopher D. Manning , Dan Jurafsky , Chelsea Finn

Category: Machine Learning

2022-11-27

A growing ecosystem of large, open-source foundation models has reduced the labeled data and technical expertise necessary to apply machine learning to many new problems. Yet foundation models pose a clear dual-use risk, indiscriminately reducing the costs of building both harmful and beneficial machine learning systems. To mitigate this risk, we propose the task blocking paradigm, in which foundation models are trained with an additional mechanism to impede adaptation to harmful tasks while retaining good performance on desired tasks. We call the resulting models self-destructing models, inspired by mechanisms that prevent adversaries from using tools for harmful purposes. We present an algorithm for training self-destructing models leveraging techniques from meta-learning and adversarial learning, showing that it can largely prevent a BERT-based model from learning to perform gender identification without harming the model's ability to perform profession classification. We conclude with a discussion of future directions.

SPICE, A Dataset of Drug-like Molecules and Peptides for Training Machine Learning Potentials

Peter Eastman , Pavan Kumar Behara , David L. Dotson , Raimondas Galvelis , John E. Herr , Josh T. Horton , Yuezhi Mao , John D. Chodera , Benjamin P. Pritchard , Yuanqing Wang

Category: Machine Learning

2022-09-21

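Machine learning potentials are an important tool for molecular simulation, but their development is held back by a shortage of high-quality datasets to train them on. We describe the SPICE dataset, a new quantum chemistry dataset for training potentials relevant to simulating drug-like small molecules interacting with proteins. It contains over 1.1 million conformations for small molecules, dimers, dipeptides, and solvated amino acids. It includes 15 elements, charged and uncharged molecules, and a wide range of covalent and non-covalent interactions. It provides both forces and energies calculated at the ωB97M-D3(BJ)/def2-TZVPPD level of theory, along with other useful quantities such as multipole moments and bond orders. We train a set of machine learning potentials on it and demonstrate that they can achieve chemical accuracy across a broad region of chemical space. It can serve as a valuable resource for the creation of transferable, ready-to-use potential functions for molecular simulations.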

Spatial-Temporal Deep Embedding for Vehicle Trajectory Reconstruction from High-Angle Video

Tianya T. Zhang, Ph.D. , Peter J. Jin, Ph.D. , Han Zhou , Benedetto Piccoli, Ph.D.

Category: Computer Vision | Artificial Intelligence

2022-09-17

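Spatial-temporal map (STMap)-based methods have shown great potential for processing high-angle videos for vehicle trajectory reconstruction, which can meet the needs of various data-driven modeling and imitation learning applications. In this paper, we develop a Spatial-Temporal Deep Embedding (STDE) model that imposes parity constraints at both the pixel and instance levels to generate instance-aware embeddings for vehicle stripe segmentation on the STMap. At the pixel level, each pixel is encoded with its 8-neighbor pixels at different ranges, and this encoding is subsequently used to guide a neural network in learning the embedding mechanism. At the instance level, a discriminative loss function is designed to pull pixels belonging to the same instance closer together and to push the means of different instances apart. The spatial-temporal affinity output is then optimized by the mutex-watershed algorithm to obtain the final clustering results. Based on segmentation metrics, our model outperforms five other baselines used for STMap processing and shows robustness under the influence of shadows, static noise, and overlapping. The designed model is applied to process all public NGSIM US-101 videos to generate complete vehicle trajectories, indicating good scalability and adaptability. Last but not least, the strengths of the scanline method with STDE and future directions are discussed. The code, STMap dataset, and video trajectories are publicly available in an online repository. GitHub link: shorturl.at/jklt0.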
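
The instance-level objective described above (pull pixels of one instance toward their mean, push the means of different instances apart) can be sketched as follows. This is a minimal illustration using a commonly used hinge-style pull/push formulation; the abstract does not give the exact loss, so the function name, the margins `delta_pull` and `delta_push`, and the equal weighting of the two terms are all assumptions rather than the authors' implementation.

```python
import math


def discriminative_loss(embeddings, labels, delta_pull=0.5, delta_push=1.5):
    """Hinge-style pull/push loss over pixel embeddings (illustrative only).

    embeddings: list of equal-length float lists, one vector per pixel.
    labels: instance id for each pixel, same length as `embeddings`.
    delta_pull, delta_push: hypothetical margins, not taken from the paper.
    """
    # Group embedding vectors by instance id.
    groups = {}
    for vec, lab in zip(embeddings, labels):
        groups.setdefault(lab, []).append(vec)
    if not groups:
        return 0.0

    # Mean embedding of each instance.
    means = {
        lab: [sum(col) / len(vecs) for col in zip(*vecs)]
        for lab, vecs in groups.items()
    }

    # Pull term: penalise pixels farther than delta_pull from their mean.
    pull = 0.0
    for lab, vecs in groups.items():
        pull += sum(
            max(0.0, math.dist(v, means[lab]) - delta_pull) ** 2 for v in vecs
        ) / len(vecs)
    pull /= len(groups)

    # Push term: penalise pairs of means closer than 2 * delta_push.
    ids = list(means)
    pairs = [(a, b) for i, a in enumerate(ids) for b in ids[i + 1:]]
    push = 0.0
    if pairs:
        push = sum(
            max(0.0, 2 * delta_push - math.dist(means[a], means[b])) ** 2
            for a, b in pairs
        ) / len(pairs)

    return pull + push


# Two tight, well-separated clusters incur no penalty.
pixels = [[0.0, 0.0], [0.0, 0.2], [5.0, 5.0], [5.0, 5.2]]
separated = discriminative_loss(pixels, [0, 0, 1, 1])
# Assigning the same pixels to interleaved instances is penalised.
mixed = discriminative_loss(pixels, [0, 1, 0, 1])
```

In a trained model the embeddings would come from the network and the loss would be computed with an automatic-differentiation framework; plain Python is used here only to make the two terms explicit.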

Ontologizing Health Systems Data at Scale: Making Translational Discovery a Reality

Tiffany J. Callahan , Adrianne L. Stefanski , Jordan M. Wyrwa , Chenjie Zeng , Anna Ostropolets , Juan M. Banda , William A. Baumgartner Jr. , Richard D. Boyce , Elena Casiraghi , Ben D. Coleman

Category: Artificial Intelligence

2022-09-10

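Common data models solve many challenges of standardizing electronic health record (EHR) data, but they are unable to semantically integrate the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide semantically computable representations of biological knowledge and enable the integration of a variety of biomedical data. However, mapping EHR data to OBO Foundry ontologies requires significant manual curation and domain expertise. We introduce a framework for mapping Observational Medical Outcomes Partnership (OMOP) standard vocabularies to OBO Foundry ontologies. Using this framework, we produced mappings for 92,367 conditions, 8,615 drug ingredients, and 10,673 measurement results. Domain experts verified the mapping accuracy, and when examined across 24 hospitals, the mappings covered 99% of conditions and drug ingredients and 68% of measurements. Finally, we demonstrate that OMOP2OBO mappings can aid in the systematic identification of undiagnosed rare disease patients who might benefit from genetic testing.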

Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset

Peter Henderson , Mark S. Krass , Lucia Zheng , Neel Guha , Christopher D. Manning , Dan Jurafsky , Daniel E. Ho

Category: Natural Language Processing

2022-07-01

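One concern with the rise of large language models is their potential for significant harm, particularly from pretraining on biased, obscene, copyrighted, and private information. Emerging ethical approaches have attempted to filter pretraining material, but such approaches have been ad hoc and have failed to take context into account. We offer an approach to filtering grounded in law, which has directly addressed the tradeoffs in filtering material. First, we gather and make available the Pile of Law, a 256GB (and growing) dataset of open-source English-language legal and administrative data, covering court opinions, contracts, administrative rules, and legislative records. Pretraining on the Pile of Law may help with legal tasks that have the promise to improve access to justice. Second, we distill the legal norms that governments have developed to constrain the inclusion of toxic or private content into actionable lessons for researchers, and we discuss how our dataset reflects these norms. Third, we show how the Pile of Law offers researchers the opportunity to learn such filtering rules directly from the data, providing an exciting new research direction in model-based processing.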

A Methodological Framework for the Comparative Evaluation of Multiple Imputation Methods: Multiple Imputation of Race, Ethnicity and Body Mass Index in the U.S. National COVID Cohort Collaborative

Elena Casiraghi , Rachel Wong , Margaret Hall , Ben Coleman , Marco Notaro , Michael D. Evans , Jena S. Tronieri , Hannah Blau , Bryan Laraway , Tiffany J. Callahan

Category: Artificial Intelligence

2022-06-13

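While electronic health records are a rich data source for biomedical research, these systems are not implemented uniformly across healthcare settings, and significant data may be missing due to healthcare fragmentation and a lack of interoperability between siloed electronic health records. Considering that deleting cases with missing data may introduce severe bias in subsequent analyses, several authors prefer applying a multiple imputation (MI) strategy to recover the missing information. Unfortunately, although several works in the literature have documented promising results using one of the different multiple imputation algorithms now freely available for research, there is no consensus on which MI algorithm works best. Besides the choice of the MI strategy, the choice of the imputation algorithm and its application settings is also both crucial and challenging. In this paper, inspired by the seminal works of Rubin and van Buuren, we propose a methodological framework that can be applied to evaluate and compare several multiple imputation techniques, with the aim of choosing the most valid one for computing inferences in clinical research. Our framework has been applied to validate, and extend on a larger cohort, the results we presented in a previous study, in which we evaluated the influence of crucial patient descriptors and COVID-19 severity in patients with type 2 diabetes mellitus whose data are provided by the National COVID Cohort Collaborative Enclave.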
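
As background on how inferences are typically computed after multiple imputation, the sketch below pools per-imputation estimates with Rubin's rules: the pooled estimate is the mean of the m estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The abstract does not describe the framework's evaluation procedure, so this is general background rather than the authors' method, and the function name `pool_rubin` is an assumption.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m per-imputation results with Rubin's rules.

    estimates: point estimate from each imputed dataset.
    variances: squared standard error of each estimate.
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations of matching length")

    pooled = mean(estimates)          # average of the m estimates
    within = mean(variances)          # average within-imputation variance
    between = variance(estimates)     # sample variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total


# Three hypothetical imputed datasets, each giving an estimate and its variance.
pooled, total = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The (1 + 1/m) factor accounts for using a finite number of imputations; it approaches 1 as m grows.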
